DeepPhys: Video-Based Physiological Measurement Using Convolutional Attention Networks
Non-contact video-based physiological measurement has many applications in
health care and human-computer interaction. Practical applications require
measurements to be accurate even in the presence of large head rotations. We
propose the first end-to-end system for video-based measurement of heart and
breathing rate using a deep convolutional network. The system features a new
motion representation based on a skin reflection model and a new attention
mechanism using appearance information to guide motion estimation, both of
which enable robust measurement under heterogeneous lighting and major motions.
Our approach significantly outperforms all current state-of-the-art methods on
both RGB and infrared video datasets. Furthermore, it allows spatial-temporal
distributions of physiological signals to be visualized via the attention
mechanism.
Comment: Accepted paper at ECCV 2018. 16 pages, 3 figures, supplementary
materials in the ancillary file.
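The two components named in the abstract lend themselves to a compact illustration. The sketch below is not the authors' released code: it shows a normalized frame-difference motion representation of the kind a skin reflection model implies, and an appearance-guided soft-attention gate applied to motion features. The tensor shapes, the sigmoid 1x1-convolution attention head, and the spatial normalization constant are assumptions.

```python
# Minimal sketch (assumed shapes and attention head), PyTorch.
import torch
import torch.nn as nn

def normalized_frame_difference(frames: torch.Tensor) -> torch.Tensor:
    """frames: (T, C, H, W) video clip. Returns a (T-1, C, H, W) motion
    input that is approximately invariant to stationary illumination,
    following the skin reflection model's ratio form."""
    eps = 1e-6
    return (frames[1:] - frames[:-1]) / (frames[1:] + frames[:-1] + eps)

class AttentionGate(nn.Module):
    """Appearance-guided soft attention: a single-channel mask,
    L1-normalized over space, multiplies the motion-branch features."""
    def __init__(self, channels: int):
        super().__init__()
        self.mask_conv = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, appearance_feat, motion_feat):
        mask = torch.sigmoid(self.mask_conv(appearance_feat))   # (N,1,H,W)
        h, w = mask.shape[-2:]
        # Normalize so the mask sums to a fixed spatial budget (assumed
        # constant); keeps feature magnitudes comparable across inputs.
        mask = mask * (h * w) / (2 * mask.abs().sum(dim=(2, 3), keepdim=True))
        return motion_feat * mask
```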
Visceral Machines: Risk-Aversion in Reinforcement Learning with Intrinsic Physiological Rewards
As people learn to navigate the world, autonomic nervous system (e.g., "fight
or flight") responses provide intrinsic feedback about the potential
consequence of action choices (e.g., becoming nervous when close to a cliff
edge or driving fast around a bend). Physiological changes are correlated with
these biological preparations to protect oneself from danger. We present a
novel approach to reinforcement learning that leverages a task-independent
intrinsic reward function trained on peripheral pulse measurements that are
correlated with human autonomic nervous system responses. Our hypothesis is
that such reward functions can circumvent the challenges associated with sparse
and skewed rewards in reinforcement learning settings and can help improve
sample efficiency. We test this in a simulated driving environment and show
that it can increase the speed of learning and reduce the number of collisions
during the learning stage.
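As a rough illustration of the reward shaping described, the sketch below combines a sparse extrinsic reward with an intrinsic term from a model assumed to predict a pulse-derived arousal estimate from the agent's observation. The `pulse_model` interface, the output range, and the weighting are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch: pulse_model is assumed to return a normalized arousal
# estimate in [0, 1]; penalizing predicted arousal induces risk aversion.
def shaped_reward(extrinsic: float, observation, pulse_model,
                  lam: float = 0.25) -> float:
    """Combine the sparse task reward with an intrinsic physiological term.
    The intrinsic term is dense, so it can speed learning even when the
    extrinsic reward is rarely observed."""
    arousal = float(pulse_model(observation))  # assumed in [0, 1]
    return extrinsic - lam * arousal
```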
Identifying Bias in AI using Simulation
Machine learned models exhibit bias, often because the datasets used to train
them are biased. This presents a serious problem for the deployment of such
technology, as the resulting models might perform poorly on populations that
are minorities within the training set and ultimately present higher risks to
them. We propose to use high-fidelity computer simulations to interrogate and
diagnose biases within ML classifiers. We present a framework that leverages
Bayesian parameter search to efficiently characterize the high dimensional
feature space and more quickly identify weaknesses in performance. We apply our
approach to an example domain, face detection, and show that it can be used to
help identify demographic biases in commercial face application programming
interfaces (APIs).
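A minimal sketch of the search loop the abstract outlines follows, assuming a hypothetical `render_face(...)` simulator control surface and a hypothetical `detector_confidence(image)` wrapper around the classifier under test. Bayesian optimization (here scikit-optimize's gp_minimize) proposes simulator parameters that minimize detector confidence, steering sampling toward regions of the appearance space where the classifier fails.

```python
# Sketch only: render_face and detector_confidence are hypothetical stand-ins
# for a face simulator and the face-detection API being interrogated.
from skopt import gp_minimize
from skopt.space import Real

space = [
    Real(0.0, 1.0, name="skin_tone"),        # assumed simulator controls
    Real(-90.0, 90.0, name="head_yaw_deg"),
    Real(0.0, 1.0, name="scene_brightness"),
]

def objective(params):
    skin_tone, yaw, brightness = params
    image = render_face(skin_tone=skin_tone, yaw=yaw,       # hypothetical
                        brightness=brightness)
    return detector_confidence(image)  # low confidence == likely failure

result = gp_minimize(objective, space, n_calls=100, random_state=0)
print("Worst-performing region:", result.x, "confidence:", result.fun)
```

Because the surrogate model concentrates evaluations where confidence is low, far fewer simulator renders are needed than with a grid or random sweep of the parameter space.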
DeepMag: Source Specific Motion Magnification Using Gradient Ascent
Many important physical phenomena involve subtle signals that are difficult
to observe with the unaided eye, yet visualizing them can be very informative.
Current motion magnification techniques can reveal these small temporal
variations in video, but require precise prior knowledge about the target
signal, and cannot deal with interference motions at a similar frequency. We
present DeepMag, an end-to-end deep neural video-processing framework based on
gradient ascent that enables automated magnification of subtle color and motion
signals from a specific source, even in the presence of large motions of
various velocities. While the approach is generalizable, the advantages of
DeepMag are highlighted via the task of video-based physiological
visualization. Through systematic quantitative and qualitative evaluation of
the approach on videos with different levels of head motion, we compare the
magnification of pulse and respiration to existing state-of-the-art methods.
Our method produces magnified videos with substantially fewer artifacts and
blurring whilst magnifying the physiological changes by a similar degree.
Comment: 24 pages, 13 figures.
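The core mechanic, magnification by gradient ascent, can be sketched briefly. The following assumes `f` is a trained network that predicts the target physiological signal from video frames; the step size, iteration count, and scalar reduction are illustrative choices, not the paper's settings.

```python
# Hedged sketch: amplify the input along the input-gradient of a signal
# predictor, so only source-specific variation grows.
import torch

def magnify(frames: torch.Tensor, f, alpha: float = 0.05, steps: int = 10):
    """frames: (N, C, H, W). Returns frames with the variation that f is
    sensitive to amplified, leaving unrelated motions largely untouched."""
    x = frames.clone().requires_grad_(True)
    for _ in range(steps):
        signal = f(x).sum()                 # scalar proxy for the target signal
        grad, = torch.autograd.grad(signal, x)
        x = (x + alpha * grad).detach().requires_grad_(True)  # ascent step
    return x.detach()
```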
A Scalable Approach for Facial Action Unit Classifier Training Using Noisy Data for Pre-Training
Machine learning systems are being used to automate many types of laborious
labeling tasks. Facial action coding is an example of such a labeling task that
requires copious amounts of time and an above-average level of human domain
expertise. In recent years, the use of end-to-end deep neural networks has led
to significant improvements in action unit recognition performance and many
network architectures have been proposed. Do the more complex deep neural
network (DNN) architectures perform sufficiently well to justify the additional
complexity? We show that pre-training on a large diverse set of noisy data can
result in even a simple CNN model improving over the current state-of-the-art
DNN architectures. The average F1-score achieved with our proposed method on the
DISFA dataset is 0.60, compared to a previous state-of-the-art of 0.57.
Additionally, we show how the number of subjects and the number of images used
for pre-training impact model performance. The approach that we have outlined
is open-source, highly scalable, and not dependent on the model architecture.
We release the code and data: https://github.com/facialactionpretrain/facs
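The recipe, noisy large-scale pre-training followed by fine-tuning on the target corpus, is straightforward to express. The sketch below is not the released code (see the repository above for that): the architecture, the data loaders, and the epoch counts and learning rates are assumptions chosen for illustration.

```python
# Hedged sketch: noisy_pretrain_loader and target_finetune_loader are
# hypothetical PyTorch DataLoaders yielding (images, multi-hot AU labels).
import torch
import torch.nn as nn

model = nn.Sequential(                        # deliberately simple CNN
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(64, 12),          # 12 AU logits (DISFA-style)
)
criterion = nn.BCEWithLogitsLoss()            # multi-label AU occurrence

def run_epochs(loader, epochs: int, lr: float):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, labels in loader:
            opt.zero_grad()
            loss = criterion(model(images), labels.float())
            loss.backward()
            opt.step()

run_epochs(noisy_pretrain_loader, epochs=5, lr=1e-3)    # noisy pre-training
run_epochs(target_finetune_loader, epochs=20, lr=1e-4)  # clean fine-tuning
```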
Do Facial Expressions Predict Ad Sharing? A Large-Scale Observational Study
People often share news and information with their social connections, but
why do some advertisements get shared more than others? A large-scale test
examines whether facial responses predict sharing. Facial expressions play a
key role in communicating emotion. Using scalable automated facial coding
algorithms, we quantify the facial expressions of thousands of individuals in
response to hundreds of advertisements. Results suggest that not all emotions
expressed during viewing increase sharing, and that the relationship between
emotion and transmission is more complex than mere valence alone. Facial
actions linked to positive emotions (i.e., smiles) were associated with
increased sharing. But while some actions associated with negative emotion
(e.g., lip depressor, associated with sadness) were linked to decreased
sharing, others (i.e., nose wrinkles, associated with disgust) were linked to
increased sharing. The ability to quickly collect facial responses at scale in
people's natural environments has important implications for marketers and opens
up a range of avenues for further research.
Comment: 33 pages.
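The analysis the abstract implies is a regression of a sharing outcome on per-ad rates of automatically coded facial actions. A hedged sketch follows; the data file, column names, and the specific actions included are hypothetical.

```python
# Illustrative only: ad_responses.csv and its columns are hypothetical
# aggregates of the facial-coding output described in the abstract.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("ad_responses.csv")
actions = ["smile", "lip_depressor", "nose_wrinkle"]
X = sm.add_constant(df[actions])          # per-ad expression rates
model = sm.Logit(df["shared"], X).fit()   # shared: 1 = viewer shared the ad
print(model.summary())   # sign and size of each action's association
```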
iPhys: An Open Non-Contact Imaging-Based Physiological Measurement Toolbox
Imaging-based, non-contact measurement of physiology (including imaging
photoplethysmography and imaging ballistocardiography) is a growing field of
research. There are several strengths of imaging methods that make them
attractive. They remove the need for uncomfortable contact sensors and can
enable spatial and concomitant measurement from a single sensor. Furthermore,
cameras are ubiquitous and often low-cost solutions for sensing. Open source
toolboxes help accelerate the progress of research by providing a means to
compare new approaches against standard implementations of the
state-of-the-art. We present an open source imaging-based physiological
measurement toolbox with implementations of many of the most frequently
employed computational methods. We hope that this toolbox will contribute to
the advancement of non-contact physiological sensing methods.
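For readers unfamiliar with the area, a minimal example of the kind of method such toolboxes implement is a green-channel imaging photoplethysmography baseline; the sketch below is an illustration in that spirit, not iPhys's own API, and its filter band and parameters are assumptions (it presumes a frame rate comfortably above 8 fps, e.g. 30 fps).

```python
# Illustrative green-channel iPPG baseline: spatially average the green
# channel of a skin region, band-pass around plausible heart rates, and
# read off the dominant spectral frequency.
import numpy as np
from scipy.signal import butter, filtfilt

def pulse_rate_bpm(frames: np.ndarray, fps: float) -> float:
    """frames: (T, H, W, 3) RGB video of a skin region."""
    g = frames[..., 1].mean(axis=(1, 2))                 # mean green per frame
    b, a = butter(3, [0.7 / (fps / 2), 4.0 / (fps / 2)], btype="band")
    filtered = filtfilt(b, a, g - g.mean())              # ~42-240 bpm band
    spectrum = np.abs(np.fft.rfft(filtered)) ** 2
    freqs = np.fft.rfftfreq(len(filtered), d=1.0 / fps)
    return 60.0 * freqs[np.argmax(spectrum)]             # dominant freq in bpm
```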
A Multimodal Emotion Sensing Platform for Building Emotion-Aware Applications
Humans use a host of signals to infer the emotional state of others. In
general, computer systems that leverage signals from multiple modalities will
be more robust and accurate at the same task. We present a multimodal affect
and context sensing platform. The system is composed of video, audio and
application analysis pipelines that leverage ubiquitous sensors (camera and
microphone) to log and broadcast emotion data in real-time. The platform is
designed to enable easy prototyping of novel computer interfaces that sense,
respond and adapt to human emotion. This paper describes the different audio,
visual and application processing components and explains how the data is
stored and/or broadcast for other applications to consume. We hope that this
platform helps advance the state-of-the-art in affective computing by enabling
development of novel human-computer interfaces.
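A hedged sketch of that architecture follows, not the platform's actual API: per-modality analysis pipelines produce estimates that are logged and broadcast as JSON for client applications to consume. The analyzer callables and the UDP transport are illustrative assumptions.

```python
# Sketch: analyzers is a dict of hypothetical modality-specific callables,
# e.g. {"video": face_affect_fn, "audio": voice_affect_fn}.
import json
import socket
import time

class EmotionBroadcaster:
    def __init__(self, analyzers, host="127.0.0.1", port=9000):
        self.analyzers = analyzers
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.addr = (host, port)

    def step(self, inputs):
        """inputs: dict mapping modality name to its latest raw sample."""
        frame = {"t": time.time()}
        for name, analyze in self.analyzers.items():
            frame[name] = analyze(inputs[name])  # modality-specific estimate
        self.sock.sendto(json.dumps(frame).encode(), self.addr)  # broadcast
        return frame                             # also usable as a log record
```

Broadcasting plain JSON over a socket keeps consumers decoupled: any prototype interface can subscribe to the stream without linking against the sensing code.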
Modeling Affect-based Intrinsic Rewards for Exploration and Learning
Positive affect has been linked to increased interest, curiosity and
satisfaction in human learning. In reinforcement learning, extrinsic rewards
are often sparse and difficult to define; intrinsically motivated learning can
help address these challenges. We argue that positive affect is an important
intrinsic reward that effectively helps drive exploration that is useful in
gathering experiences. We present a novel approach leveraging a
task-independent reward function trained on spontaneous smile behavior that
reflects the intrinsic reward of positive affect. To evaluate our approach we
trained several downstream computer vision tasks on data collected with our
policy and several baseline methods. We show that the policy based on our
affective rewards successfully increases the duration of episodes and the area
explored, and reduces collisions. The impact is an increased speed of learning
for several downstream computer vision tasks.
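In contrast to the extrinsic-plus-intrinsic shaping above, the signal here can serve as a task-independent reward on its own during data gathering. A minimal sketch under stated assumptions: `smile_model` is a learned predictor of spontaneous smile response given the agent's current observation, and its output alone drives exploration.

```python
# Hedged sketch: smile_model is an assumed predictor returning a scalar
# positive-affect estimate for the current observation.
def intrinsic_affect_reward(observation, smile_model,
                            scale: float = 1.0) -> float:
    """Positive affect as reward: higher predicted smile response encourages
    the agent to seek engaging states, lengthening episodes and widening
    coverage before any downstream task is trained."""
    return scale * float(smile_model(observation))
```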
M3D-GAN: Multi-Modal Multi-Domain Translation with Universal Attention
Generative adversarial networks have led to significant advances in
cross-modal/domain translation. However, typically these networks are designed
for a specific task (e.g., dialogue generation or image synthesis, but not
both). We present a unified model, M3D-GAN, that can translate across a wide
range of modalities (e.g., text, image, and speech) and domains (e.g.,
attributes in images or emotions in speech). Our model consists of modality
subnets that convert data from different modalities into unified
representations, and a unified computing body where data from different
modalities share the same network architecture. We introduce a universal
attention module that is jointly trained with the whole network and learns to
encode a large range of domain information into a highly structured latent
space. We use this to control synthesis in novel ways, such as producing
diverse realistic pictures from a sketch or varying the emotion of synthesized
speech. We evaluate our approach on extensive benchmark tasks, including
image-to-image, text-to-image, image captioning, text-to-speech, speech
recognition, and machine translation. Our results show state-of-the-art
performance on some of the tasks.
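The three-part structure the abstract describes can be sketched compactly. The following is an illustration, not the authors' implementation: per-modality subnets project inputs into a unified representation, a shared body processes all modalities with the same weights, and a "universal attention" module attends over a learned bank of domain codes to condition synthesis. All sizes and layer choices are assumptions.

```python
# Minimal PyTorch sketch of the modality-subnet / shared-body / universal-
# attention layout; dimensions are illustrative.
import torch
import torch.nn as nn

class UniversalAttention(nn.Module):
    """Attend over a learned latent bank of domain codes."""
    def __init__(self, dim: int, n_codes: int = 64):
        super().__init__()
        self.codes = nn.Parameter(torch.randn(n_codes, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, h):                        # h: (N, L, dim)
        codes = self.codes.unsqueeze(0).expand(h.size(0), -1, -1)
        out, _ = self.attn(h, codes, codes)      # query=features, kv=codes
        return h + out                           # domain-conditioned features

class M3DSketch(nn.Module):
    def __init__(self, in_dims: dict, dim: int = 256):
        super().__init__()
        self.subnets = nn.ModuleDict(            # one encoder per modality
            {name: nn.Linear(d, dim) for name, d in in_dims.items()})
        self.body = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                  nn.Linear(dim, dim))  # shared weights
        self.attend = UniversalAttention(dim)

    def forward(self, modality: str, x):         # x: (N, L, in_dims[modality])
        h = self.subnets[modality](x)
        return self.attend(self.body(h))
```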